Uncovering the topology of configuration space networks 
The configuration space network (CSN) of a dynamical system is an effective approach to represent the ensemble of configurations sampled during a simulation and their dynamic connectivity. To elucidate the connection between the CSN topology and the underlying free-energy landscape governing the system dynamics and thermodynamics, an analytical solution is provided to explain the heavy tail of the degree distribution, the neighbor connectivity and the clustering coefficient. This derivation explains the universal CSN topology observed in systems ranging from a simple quadratic well to the native state of the beta3s peptide and a 2D lattice heteropolymer. Moreover, CSNs are shown to fall into the general class of complex networks described by the fitness model.

PACS numbers: 89.75.Fb, 89.75.Da, 87.15.He 



I. INTRODUCTION 



The use of complex networks and graph theory to describe complex systems ranging from the WWW to protein interaction networks is by now well established (several books and reviews on this topic are available). A large class of these systems attains complexity by means of their internal dynamics [6], which is often revealed by computer simulations. One example of such systems is protein folding, for which simulations have been used extensively in structural biology. Nowadays several Molecular Dynamics (MD) software packages (CHARMM [7], GROMACS [8], AMBER [9]) are available to follow the time evolution of folding/unfolding. However, because of the large number of degrees of freedom involved in the process, the results of MD simulations themselves form a highly complex system. As a consequence, a detailed, unbiased description of the free-energy landscape underlying the thermodynamics and kinetics cannot easily be extracted.

To tackle this complexity, new approaches based on complex networks have recently been introduced, showing that a network description is effective for the analysis and visualization of simulation results. In [11], for instance, the topology of the configurations of a short lattice polymer has been mapped onto a network. Doye and Massen have also applied graph analysis to study the organization of the potential energy minima in a Lennard-Jones cluster of atoms [12, 13]. In another work, the concept of disconnectivity graphs has been used to analyze the free energy of a tetrapeptide and a β-hairpin [14, 15]. Finally, the free-energy landscapes of a three-stranded β-sheet (beta3s) and of alanine dipeptide sampled by MD simulations have been represented as configuration space networks (CSNs).

Given the time evolution of a dynamical system, the CSN represents the ensemble of microstates (configurations) sampled during a simulation and their dynamic connectivity. In this representation, nodes are system configurations and links are direct transitions between the configurations sampled during the simulation. The CSN topology shares several features with other networks representing systems as different as cell function [18], scientific collaborations [19] and the WWW [20]. In particular, it has been shown [16] that the degree distribution of the beta3s peptide CSN exhibits a heavy tail well approximated by a power law, and that the average neighbor connectivity displays a disassortative behavior. Moreover, the clustering coefficient decays compatibly with a 1/k function for large values of the degree, which has been interpreted as the signature of a hierarchical organization of the nodes [21]. Recently, the connection between CSN topological clusters and free-energy basins has been explored, and an analytical solution for the node weight distribution observed in CSNs has been provided with the help of simple energy landscape models [17]. Following these lines of research, the challenge is now to find the connection between network topology, system dynamics and free-energy landscape organization. This work focuses on the degree distribution P(k), the average neighbor connectivity K_nn(k) and the clustering coefficient C(k) observed in CSNs. Several studies [22, 23, 24, 25] have shown that the analysis of these three distributions is an important step towards understanding network organization and architecture. The results presented below provide the first rationale for the origin of several unexplained properties of CSNs.

The paper is organized as follows. Section II describes in detail how CSNs are built. Section 
III shows how the degree distribution, the average neighbor degree and the clustering coefficient 
relate to the free-energy landscape. In Section IV an analytical derivation and simulation results
are presented for the quadratic well model. Then the CSNs obtained from beta3s peptide and 
lattice heteropolymer simulations are analyzed in Section V. Finally, the connection between CSNs 
and the fitness model is discussed in Section VI and conclusions are presented in Section VII. 

II. CONFIGURATION SPACE NETWORKS 

The simulation of a dynamical system, like a peptide or a protein, results in a time series of snapshots representing the dynamics. The CSN of such a process gives a synthetic view of the configurations and transitions observed during the simulation. System configurations are the nodes, and a link is placed between two nodes if they appear consecutively in the time series. The time between two snapshots, $t_s$ (usually called the configuration saving time), is a free parameter: $t_s = \delta t\, M$, where $\delta t$ is the integration time step of the simulation and $M$ is the number of microscopic steps between two snapshots. When $M$ approaches 1, only configurations that are spatially close to each other are connected. A link is therefore a temporal relation between configurations, and changing $M$ changes the set of links.
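The construction just described can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: it assumes that every integration step has already been mapped to a hashable configuration label (for instance a grid cell or a secondary-structure string), and the function name `build_csn` is hypothetical.

```python
from collections import Counter

def build_csn(trajectory, M=1):
    """Weighted directed CSN from a time series of configuration labels.

    trajectory: one hashable configuration label per integration step
                (hypothetical input format).
    M: number of integration steps between two saved snapshots.
    Returns (node_weights, link_weights) as Counters.
    """
    snapshots = list(trajectory)[::M]                       # keep every M-th configuration
    node_weights = Counter(snapshots)                       # w_i: number of visits to i
    link_weights = Counter(zip(snapshots, snapshots[1:]))   # w_{i->j}, loops i->i included
    return node_weights, link_weights

traj = ["A", "A", "B", "A", "C", "C", "B", "A"]
w, wl = build_csn(traj, M=1)
print(w["A"], wl[("A", "B")], wl[("C", "C")])  # 4 1 1
```

Changing `M` only changes which snapshots are kept, so the same trajectory yields a different set of links for each saving time, as stated above.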

The weight $w_{i\to j}$ of a link represents the number of direct transitions from node $i$ to node $j$. Similarly, the weight $w_i$ of a node is given by the number of times the corresponding configuration has been visited. The weight distribution of CSNs has been discussed in a previous work.

The degree of a node is defined as the number of its links, including loops (edges from a node to itself); it corresponds to the number of distinct configurations reached from that node in $M$ steps during the dynamics. The CSN is a directed network: because simulations have finite length, if the system visited node $j$ $M$ steps after node $i$, the converse is not automatically true. Hence $k_i^{\mathrm{in}}$ and $k_i^{\mathrm{out}}$ are not always equal.






However, the asymmetry of the links is weak for two reasons. First, the simulation is run long enough to ensure that $w_{i\to j} \approx w_{j\to i}$ (which is in fact equivalent to detailed balance). Second, by construction of the network, the total weight of the incoming links of a node equals the total weight of its outgoing links (up to the first and last snapshots of the time series).

In the following, the degree of a node, $k_i$, is defined as the out-degree $k_i^{\mathrm{out}}$. Similarly, the average neighbor degree $K_{nn}(k)$ is the average out-degree of the neighbors of the nodes with degree $k$. The out-degree correlation between connected nodes is further characterized by the assortativity coefficient $q$ [26]. Finally, the clustering coefficient of node $i$ is computed as the total number $N_i^{\Delta}$ of 3-step cycles (triangles) starting at node $i$, divided by the maximum number of 3-step cycles it can have in the considered graph: $c_i = N_i^{\Delta}/(k_i^{\mathrm{out}} k_i^{\mathrm{in}})$.
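The three quantities defined above can be computed from the link weights with a short sketch. It is a hypothetical helper, not the authors' implementation, and it makes two assumptions that the text leaves open: $K_{nn}$ averages over out-neighbors other than the node itself, and the clustering denominator $k_i^{\mathrm{out}} k_i^{\mathrm{in}}$ is evaluated with self-loops excluded, so that $c_i \le 1$.

```python
from collections import defaultdict

def topology(link_weights):
    """Out-degree, K_nn(k) and directed clustering from {(i, j): w_ij}."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for (i, j), w in link_weights.items():
        if w > 0:
            out_nb[i].add(j)
            in_nb[j].add(i)
    nodes = set(out_nb) | set(in_nb)
    k_out = {i: len(out_nb[i]) for i in nodes}            # loops counted in the degree

    # K_nn(k): mean out-degree of the (non-self) out-neighbours, averaged over nodes of degree k
    knn_sum, knn_cnt = defaultdict(float), defaultdict(int)
    for i in nodes:
        nb = out_nb[i] - {i}
        if nb:
            knn_sum[k_out[i]] += sum(k_out[j] for j in nb) / len(nb)
            knn_cnt[k_out[i]] += 1
    k_nn = {k: knn_sum[k] / knn_cnt[k] for k in knn_sum}

    # c_i: directed 3-cycles i -> j -> l -> i divided by k_out * k_in (loops excluded)
    clustering = {}
    for i in nodes:
        outs, ins = out_nb[i] - {i}, in_nb[i] - {i}
        if outs and ins:
            n_tri = sum(1 for j in outs for l in out_nb[j] if l not in (i, j) and l in ins)
            clustering[i] = n_tri / (len(outs) * len(ins))
    return k_out, k_nn, clustering

k_out, k_nn, c = topology({("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1, ("A", "A"): 2})
# c == {"A": 1.0, "B": 1.0, "C": 1.0}: every node lies on the cycle A -> B -> C -> A
```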



III. ANALYTICAL APPROACH 



As already mentioned, the ultimate goal is to understand the relation between the network topology and the free-energy landscape. Unfortunately, quantities such as the degree of a node cannot, in general, be computed from the energy landscape alone for arbitrary $M$. However, large values of $M$ correspond to a random sampling of the landscape (uncorrelated exploration), and in this case an analytical approach can be carried out. The probability density on the free-energy landscape is given by $W(\mathbf{x}) = W_0 \exp(-U(\mathbf{x}))$, where $U(\mathbf{x})$ (in $k_BT$ units) is the multidimensional energy potential and $W_0$ the normalization constant. As shown later in this article, uncorrelated exploration of the free-energy landscape is relevant for several simulation procedures. In the space $\mathbf{x}$, where $D$ is the dimension of the landscape, the system configurations (i.e., CSN nodes) are defined as hypercubic cells of side $a$ (volume $a^D$). The probability to visit a configuration at $\mathbf{x}_1$ at a given time is $P(\mathbf{x}_1) = a^D W_0 \exp(-U(\mathbf{x}_1))$, and the expected number of times two configurations at positions $\mathbf{x}_1$ and $\mathbf{x}_2$ are visited consecutively is given by:

$$W(\mathbf{x}_1,\mathbf{x}_2) = W_N\, e^{-U(\mathbf{x}_1)-U(\mathbf{x}_2)} \tag{1}$$

where $W_N = N a^{2D} W_0^2$, $N$ is the total number of snapshots, and $a$ is chosen small enough that $\exp(-U(\mathbf{x}))$ is almost constant on each cell. The expression above predicts link weights. To compute the degree, the quantity of interest is instead the probability $P(\mathbf{x}_1,\mathbf{x}_2)$ to have a link between two configurations, regardless of how often the link has been visited. Assuming that the distribution of the number $s$ of visits to the node at $\mathbf{x}_1$ is peaked around its average value $N P(\mathbf{x}_1)$, $P(\mathbf{x}_1,\mathbf{x}_2)$ is evaluated as one minus the probability of having no link:

$$P(\mathbf{x}_1,\mathbf{x}_2) = 1 - \left(1 - P(\mathbf{x}_2)\right)^{N P(\mathbf{x}_1)} \simeq 1 - e^{-N P(\mathbf{x}_1) P(\mathbf{x}_2)} \tag{2}$$

where the approximation holds in the limit of small $P(\mathbf{x}_2)$, which is true if the number of configurations is large. $N P(\mathbf{x}_1)$ is the expected number of times the configuration at $\mathbf{x}_1$ has been visited and $1 - P(\mathbf{x}_2)$ is the probability of not visiting the configuration at $\mathbf{x}_2$ next. Eq. (2) is an approximation: an exact expression would require summing, over $s$, the probability of visiting node $\mathbf{x}_1$ $s$ times multiplied by the probability of never visiting $\mathbf{x}_2$ right after $\mathbf{x}_1$, i.e. $(1 - P(\mathbf{x}_2))^s$, excluding the cases in which $\mathbf{x}_1$ has been visited several times consecutively. However, this is very difficult to express in a simple form, and approximations are needed to proceed with further analytical calculations. From Eq. (2) two asymptotic behaviors can be derived:

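1. If $N P(\mathbf{x}_1) P(\mathbf{x}_2)$ is large, then $P(\mathbf{x}_1,\mathbf{x}_2) \approx 1$. This is the saturation regime, since $\mathbf{x}_1$ and $\mathbf{x}_2$ are almost certainly connected.

2. If $N P(\mathbf{x}_1) P(\mathbf{x}_2)$ is small, the sparse regime is reached, which describes low-probability connections. In this regime an $n$th-order expansion is meaningful:

$$P^{(n)}(\mathbf{x}_1,\mathbf{x}_2) = \sum_{j=1}^{n} \frac{(-1)^{j+1}}{j!}\left[N P(\mathbf{x}_1) P(\mathbf{x}_2)\right]^j \tag{3}$$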

The first term in Eq. (3) is equal to $W(\mathbf{x}_1,\mathbf{x}_2)$ in Eq. (1). Taking only the first term in the sum corresponds to distributing links while avoiding double links as much as possible. If $N P(\mathbf{x}_1) P(\mathbf{x}_2) \ll 1$, it provides a good approximation of the sum, while for $N P(\mathbf{x}_1) P(\mathbf{x}_2) \approx 1$ it slightly overestimates the real probability to have a link between two nodes. Applying the considerations above, an approximation of $P(\mathbf{x}_1,\mathbf{x}_2)$ is given by:

$$\bar{P}^{(n)}(\mathbf{x}_1,\mathbf{x}_2) = \min\left[1,\, P^{(n)}(\mathbf{x}_1,\mathbf{x}_2)\right] \tag{4}$$
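A quick numerical comparison makes the hierarchy of approximations in Eqs. (2)-(4) concrete. The sketch below is illustrative only; the function name `link_probabilities` is hypothetical, and `p1`, `p2` stand for the visit probabilities $P(\mathbf{x}_1)$ and $P(\mathbf{x}_2)$.

```python
import math

def link_probabilities(p1, p2, N, n=1):
    """Link probability between two cells visited with probabilities p1, p2 in N snapshots."""
    z = N * p1 * p2
    first_form = 1.0 - (1.0 - p2) ** (N * p1)     # first expression in Eq. (2)
    exp_form = 1.0 - math.exp(-z)                 # small-P(x2) limit in Eq. (2)
    series = sum((-1) ** (j + 1) * z ** j / math.factorial(j)
                 for j in range(1, n + 1))        # n-th order expansion, Eq. (3)
    capped = min(1.0, series)                     # Eq. (4)
    return first_form, exp_form, capped

print(link_probabilities(1e-3, 1e-3, 1e5))   # sparse regime: the n = 1 value slightly exceeds the others
print(link_probabilities(1e-2, 1e-2, 1e6))   # saturation regime: all three are (close to) 1
```

In the sparse regime the truncated series with $n = 1$ slightly overestimates the exponential form, which is the overestimate mentioned in the text.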

Eq. (4) defines a probability to have a link between two nodes that depends only on a parameter associated with each node (in this case the energy $-U(\mathbf{x})$). Such systems were first described as the fitness model in [27] and generalized in [28, 29] (see Section VI). The degree of a node at $\mathbf{x}$, its average neighbor degree and the expected number of triangles the node is part of are then given by the following three expressions:

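$$k(\mathbf{x}) = \frac{1}{a^D}\int d\mathbf{x}_1\, \bar{P}^{(n)}(\mathbf{x},\mathbf{x}_1) \tag{5}$$

$$K_{nn}(\mathbf{x}) = \frac{1}{a^D k(\mathbf{x})}\int d\mathbf{x}_1\, \bar{P}^{(n)}(\mathbf{x},\mathbf{x}_1)\, k(\mathbf{x}_1) \tag{6}$$

$$N^{\Delta}(\mathbf{x}) = \frac{1}{a^{2D}}\int d\mathbf{x}_1\, d\mathbf{x}_2\, \bar{P}^{(n)}(\mathbf{x},\mathbf{x}_1)\,\bar{P}^{(n)}(\mathbf{x},\mathbf{x}_2)\,\bar{P}^{(n)}(\mathbf{x}_1,\mathbf{x}_2) \tag{7}$$

Finally, in the continuum approximation, the degree distribution reads:

$$P(k) \sim \int d^D x\; \delta\!\left(k - k(\mathbf{x})\right) \tag{8}$$

Inverting Eq. (5) and inserting $\mathbf{x}(k)$ into Eq. (6) and into Eq. (7), the latter divided by $k^2$, gives the average neighbor connectivity $K_{nn}(k)$ and the clustering coefficient $C(k)$, respectively:

$$K_{nn}(k) = \frac{1}{a^D k}\int d\mathbf{x}_1\, \bar{P}^{(n)}(\mathbf{x}(k),\mathbf{x}_1)\, k(\mathbf{x}_1) \tag{9}$$

$$C(k) = \frac{1}{a^{2D} k^2}\int d\mathbf{x}_1\, d\mathbf{x}_2\, \bar{P}^{(n)}(\mathbf{x}(k),\mathbf{x}_1)\,\bar{P}^{(n)}(\mathbf{x}(k),\mathbf{x}_2)\,\bar{P}^{(n)}(\mathbf{x}_1,\mathbf{x}_2) \tag{10}$$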
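On a finite grid the integrals in Eqs. (5), (9) and (10) become matrix sums, which gives a direct way to evaluate them for any landscape. The sketch below is a hypothetical discretization, not the authors' code: it assumes $n = 1$, normalizes the cell probabilities by their discrete sum instead of the continuum $W_0$, and uses an arbitrary 24 × 24 grid for the quadratic well.

```python
import numpy as np

def fitness_network_stats(U, N):
    """Discretised Eqs. (5), (9), (10) with n = 1 for cells of energy U (k_B T units)."""
    U = np.asarray(U, dtype=float).ravel()
    p = np.exp(-(U - U.min()))
    p /= p.sum()                                    # P(x_i) = a^D W_0 exp(-U(x_i))
    Pbar = np.minimum(1.0, N * np.outer(p, p))      # Eq. (4) with n = 1
    k = Pbar.sum(axis=1)                            # Eq. (5): (1/a^D) * integral -> sum over cells
    k_nn = (Pbar @ k) / k                           # Eq. (9)
    tri = ((Pbar @ Pbar) * Pbar).sum(axis=1)        # sum_{j,l} P_ij P_jl P_il, cf. Eq. (7)
    C = tri / k**2                                  # Eq. (10)
    return k, k_nn, C

a = 0.25
centers = np.arange(-3 + a / 2, 3, a)               # 24 cell centres per axis (hypothetical grid)
X, Y = np.meshgrid(centers, centers)
k, k_nn, C = fitness_network_stats(0.5 * (X**2 + Y**2), N=3e6)
```

Because each link probability is bounded by one, the expected number of triangles never exceeds $k^2$, so the computed $C$ stays in $[0, 1]$.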

IV. QUADRATIC WELL 

In general, the free-energy landscapes of real systems are extremely complex, so that even writing down a mathematical expression is often impossible. However, close to the minimum of a basin (corresponding to configurations visited many times), such systems can often be approximately described by a Taylor expansion of the potential, whose leading non-constant term is harmonic.

The quadratic well is therefore a good benchmark to understand more complex CSNs [17], in particular for nodes near the minimum of an energy basin. In two dimensions, the potential is given by:

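$$U(x,y) = \frac{1}{2}\left(x^2+y^2\right) = \frac{1}{2} r^2 \tag{11}$$

Using radial coordinates and introducing Eq. (11) into Eq. (3) with $n = 1$ gives ($W_0 = 1/(2\pi)$):

$$P^{(1)}(\mathbf{x}_1,\mathbf{x}_2) = \frac{N a^4}{4\pi^2}\, e^{-(r_1^2+r_2^2)/2} \geq 1 \;\Leftrightarrow\; r_2^2 \leq 2\ln\!\left(\frac{N a^4}{4\pi^2}\right) - r_1^2 = B - r_1^2 \tag{12}$$

with $B = 2\ln\left(N a^4/(4\pi^2)\right)$. Hence a necessary condition for $P^{(1)}(\mathbf{x}_1,\mathbf{x}_2) \geq 1$ is that both $r_1 < \sqrt{B}$ and $r_2 < \sqrt{B}$. The degree distribution is then obtained from Eq. (5) (see Appendix for the complete derivation):

$$P(k) \propto \begin{cases} \text{const} & \text{if } r < \sqrt{B} \;\Leftrightarrow\; k > 2\pi/a^2 \\ 1/k & \text{if } r > \sqrt{B} \;\Leftrightarrow\; k < 2\pi/a^2 \end{cases} \tag{13}$$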

Note that the flat tail for $k > 2\pi/a^2$ is an artifact of the continuum approximation in $D = 2$. Analytical calculations, for instance in $D = 4$, where they are still manageable (but somewhat tedious), show a decreasing behavior even for $k > 2\pi/a^2$.

In the same way, the average neighbor connectivity is computed from Eq. (9) and the clustering coefficient from Eq. (10). Results are shown graphically in Fig. 1 (see Appendix for the analytical calculations).

In order to compare the analytical predictions obtained above for $n = 1$ with the CSN obtained from simulations, Langevin dynamics with the potential energy defined by Eq. (11) is performed according to the equation of motion:

$$\gamma\,\frac{d\mathbf{x}}{dt} = -\frac{\partial U}{\partial \mathbf{x}} + \mathbf{f}(t)$$

where $\gamma$ is the friction coefficient and $\mathbf{f}(t)$ is a white noise with mean value $\langle f(t)\rangle = 0$ and $\langle f(t) f(t')\rangle = \delta(t-t')$. Without loss of generality $\gamma$ is set to one (this merely rescales time) and the integration step to $\delta t = 0.001$ in all simulations. For the two-dimensional quadratic well, configurations are defined as square cells of area $a^2$ ($a = 0.2$). A total of $N = 3 \cdot 10^6$ time steps has been employed for all simulations.
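A compact way to reproduce the qualitative effect of $M$ is sketched below. Note that it swaps in a different integration technique: instead of the Euler integration with $\delta t = 0.001$, each saving time $t_s = M\,\delta t$ is propagated with the exact Ornstein-Uhlenbeck transition of the quadratic well, assuming $\gamma = k_BT = 1$ and a noise normalization whose stationary density is $\propto e^{-U}$. The function name and the snapshot count are hypothetical choices for illustration.

```python
import math
import random

def quadratic_well_csn_degrees(n_snapshots, M, dt=1e-3, a=0.2, seed=0):
    """Out-degree of every CSN node for overdamped dynamics in U = (x^2 + y^2) / 2.

    Snapshots are separated by t_s = M * dt and generated with the exact
    Ornstein-Uhlenbeck transition (an assumption replacing the Euler integrator).
    """
    rng = random.Random(seed)
    decay = math.exp(-M * dt)                  # deterministic relaxation over t_s
    sigma = math.sqrt(1.0 - decay * decay)     # noise amplitude over t_s (unit stationary variance)
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)   # start at equilibrium
    out_neighbors, prev = {}, None
    for _ in range(n_snapshots):
        node = (math.floor(x / a), math.floor(y / a))  # square cell of side a = one CSN node
        out_neighbors.setdefault(node, set())
        if prev is not None:
            out_neighbors[prev].add(node)              # link prev -> node (loops allowed)
        prev = node
        x = decay * x + sigma * rng.gauss(0.0, 1.0)
        y = decay * y + sigma * rng.gauss(0.0, 1.0)
    return {node: len(nb) for node, nb in out_neighbors.items()}

local = quadratic_well_csn_degrees(20_000, M=1)               # correlated, spatially local links
uncorrelated = quadratic_well_csn_degrees(20_000, M=10_000)   # effectively random sampling
print(max(local.values()), max(uncorrelated.values()))
```

For $M = 1$ a node can only link to itself and to adjacent cells, whereas for $M = 10^4$ the bottom cells collect links from a large part of the well, which is the origin of the heavy tail discussed next.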

The degree distribution P(k) for different values of the parameter M is shown in Fig. 1A. The distribution follows a power law of the form 1/k for M > 100. In this example, all CSN realizations with $M > 10^4$ are equivalent to a random sampling of the energy landscape with probabilities given by $W_0 e^{-U(\mathbf{x})}$ (black points in the figure). Hence, for these values of M the distribution follows the analytical prediction.

In Fig. 1B, the average neighbor connectivity $K_{nn}(k)$ is plotted for different values of M. The behavior of $K_{nn}(k)$ changes as M increases. For low values of M, $K_{nn}(k)$ is an increasing function of k, which indicates an assortative behavior. This is no longer true for large values of M: in this regime, $K_{nn}(k)$ shows a decaying tail typical of a disassortative network. For $M > 10^4$ the curves cannot be distinguished from the one obtained with uncorrelated sampling. The flat region for small k arises because nodes with small degree are likely to be connected to nodes with high degree, which lie at the bottom of the basin. This reflects the fact that, for large M, transitions starting at a node far from the minimum are likely to end up at the bottom of the basin, which is characterized by nodes with a large degree. On the other hand, for small values of M only neighboring configurations are visited consecutively.

As already pointed out, the approximation n = 1 slightly overestimates the node degree, which explains why results for the uncorrelated sampling lie slightly below the analytical approximation. The network assortativity coefficient q [26], computed for different values of M, reveals the same transition between assortative and disassortative regimes (see Fig. 2). For M < 100 the network shows strong assortativity, with values around q ≈ 0.8. Increasing M makes the assortativity coefficient drop to values smaller than −0.3, indicating that the system has undergone an assortative-to-disassortative transition.



Although models with changing assortativity have recently been proposed [30], to our knowledge this is the first time that a family of networks underlying the same physical process, i.e. diffusion in a well, presents this kind of transition.

In the same way, the clustering coefficient C(k) exhibits different behaviors as a function of M. For M < 1000, C(k) grows with k, indicating that triangles easily form at the bottom of the basin. On the other hand, as M increases, C(k) shows a decaying tail for large values of the degree k. For $M > 10^4$ the C(k) obtained from the Langevin dynamics follows the analytical prediction (Fig. 1C).



The changing behavior of the CSN topology of a quadratic well for different values of the configuration saving time can be understood in a more general kinetic framework by considering relaxation times to a given configuration of the landscape. Fig. 3A shows the distribution of the relaxation times to the configuration lying at the bottom of the well, for three different starting configurations. Configurations at a small distance from the bottom node (small r) exhibit a downhill distribution, which is not the case for larger values of r. However, as M increases, all distributions overlap (up to a global multiplicative factor), indicating that the kinetics to the bottom is the same for all configurations (see Fig. 3B). This corresponds to the uncorrelated regime of Eq. (1). In this case, the probability to have a link between two configurations depends only on the configuration weights.

V. NATIVE STATE OF A TRIPLE-STRANDED β-SHEET AND LATTICE HETEROPOLYMER



The analytical and numerical results obtained above are crucial for a correct interpretation of the CSN topology observed in complex systems, for which a direct application of Eqs. (5)-(10) is unfeasible. In the following, the CSN topology of the native basin of a triple-stranded β-sheet peptide (beta3s) sampled by MD, as well as that of a lattice heteropolymer sampled by Monte Carlo (MC), is investigated (see Figs. 4 and 5).

The MD simulation of the native state of beta3s is performed at 270 K for a total of 10 ns of simulation time, which is enough for a correct sampling of the basin. The low simulation temperature prevented the system from jumping to a different basin. The MD simulation is performed using the CHARMM PARAM19 force field [7] and an integration time step of δt = 2 fs. A mean-field approximation based on the solvent-accessible surface was used to describe the main effects of the aqueous solvent on the solute [31]. The two parameters of the solvation model were optimized without using beta3s. The same force field and implicit solvent model have been used recently in MD simulations of various systems.



Secondary structure is calculated [35] for each snapshot saved along the MD trajectory. Here a configuration (i.e., a CSN node) is defined as a single string of secondary structure; e.g., the most populated configuration of beta3s at 270 K (see inset of Fig. 4A) is -EEE-STTEEEEESSEEEE-. The total of $5 \times 10^6$ snapshots sampled during the MD simulation resulted in 249 secondary-structure configurations. There are eight possible letters in the secondary-structure "alphabet": "H", "G", "I", "E", "B", "T", "S" and "-", standing for α-helix, 3₁₀-helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend, and unstructured, respectively. Since the N- and C-terminal residues are always assigned "-", a 20-residue peptide can, in principle, assume $8^{18} \approx 10^{16}$ configurations.

The 2D lattice heteropolymer is simulated in the framework of the popular HP model [36, 37, 38, 39, 40]. In this description, the amino acid sequence of a protein is represented as a binary sequence of hydrophobic (H) and polar (P) residues. The results presented here are obtained with the random sequence HHPHPPHHPPHHPH (inset of Fig. 5A). Note that similar results are observed for different HP sequences (random and protein-like) as well as for different numbers of beads, ranging from 10 to 20 (a detailed presentation of these results is in preparation). The time series of configurations is generated from moves of the polymer according to the Metropolis rule. It is important to mention that the qualitative observations do not depend on the set of moves (local moves and the global "pivot" moves [41, 42] have been tested). In the standard HP model, the energy of a configuration is simply minus the number of its non-bonded H-H contacts on the lattice. From a physical point of view, the cornerstone of the simulation is the appropriate choice of the temperature T to achieve an effective sampling of the lowest-energy configurations. This has been accomplished by sampling at a temperature significantly lower than the coil-to-globule transition temperature of the polymer (i.e. $k_B T_{\mathrm{samp}} = 0.3$ and $k_B T_{\mathrm{trans}} \approx 0.5$). The transition temperature has been identified by a thorough study of the heat capacity $C_V$ and of two structural quantities, namely the radius of gyration and the end-to-end distance. A CSN node is defined as a single lattice configuration up to a symmetry of the lattice.
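Two ingredients of this setup, the HP energy and the node definition "up to a symmetry of the lattice", are sketched below as hypothetical helpers (not the code used for the simulations). The sketch assumes that configurations are lists of square-lattice coordinates and that the relevant symmetries are translations plus the eight rotations and reflections of the square lattice; chain reversal is not included because the sequence is not palindromic.

```python
from itertools import product

def hp_energy(sequence, coords):
    """Standard HP energy: minus the number of non-bonded H-H nearest-neighbour contacts."""
    pos = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x, y + 1)):        # +x and +y only: each lattice pair seen once
            j = pos.get(nb)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts

def canonical(coords):
    """Node label: configuration up to translations and the 8 square-lattice symmetries."""
    forms = []
    for sx, sy, swap in product((1, -1), (1, -1), (False, True)):
        pts = [(sx * y, sy * x) if swap else (sx * x, sy * y) for x, y in coords]
        x0, y0 = pts[0]
        forms.append(tuple((x - x0, y - y0) for x, y in pts))   # first bead at the origin
    return min(forms)                                          # same label for equivalent shapes

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HHHH", square))   # -1: only beads 0 and 3 form a non-bonded contact
```

With these two functions a Metropolis move would be accepted with probability $\min(1, e^{-\Delta E/k_BT})$, and `canonical` would serve as the CSN node key.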

In both the beta3s and heteropolymer systems, two nodes are linked if a direct transition between them (at a given M) has been observed during the simulation. The topologies of the two CSNs share several properties. The degree distributions P(k) for beta3s and the heteropolymer are shown in Figs. 4A and 5A, respectively. The distributions are robust upon varying the configuration saving time (i.e. changing the value of M) and resemble a power law $k^{-\gamma}$ for M > 1, with exponent γ between 1.5 and 2. This behavior is qualitatively similar to what is observed for the quadratic well, while the steeper slope may result from the higher dimension (i.e., larger number of degrees of freedom) of the energy landscapes. Interestingly, the average neighbor connectivity $K_{nn}(k)$ changes significantly for different values of M (Figs. 4B and 5B). For M < 100 this quantity grows with the degree, whereas for larger values of M, $K_{nn}(k)$ becomes a decreasing function of k. Moreover, the assortativity coefficient q changes from positive values for M = 1 to negative values for M > 1000, indicating an assortative-to-disassortative transition (see Fig. 2).

The clustering coefficient C(k) converges towards a general decreasing behavior for large M (Figs. 4C and 5C). In a previous work, the apparent scaling of C(k) had been interpreted as the signature of a hierarchical organization of the nodes in the native state of beta3s [16]. However, a comparison between the C(k) of beta3s for different values of M and that of the quadratic well (see Fig. 1) strongly suggests that this decay does not indicate a node hierarchy as presented in [21]. First, the quadratic well underlying the CSN does not present a hierarchical organization like the one in [21]. Second, in the uncorrelated regime the nodes lying at the bottom of the basin are connected together, giving rise to an almost complete subgraph. In this regime, nodes with low degree are unlikely to be connected to each other but tend to connect to high-degree nodes (bottom configurations). These two effects are sufficient to explain the decay observed in C(k) without invoking the presence of a node hierarchy.

For the CSN of beta3s there is no rigorous evidence that an uncorrelated regime is reached for M > 100. However, an analysis of the transition probabilities (i.e. link weights) can account for this behavior. Fig. 6 shows the relation between $\log(w_i/w_1)$ and $\log(w_{i\to 1}/w_{1\to 1})$, where $w_i$ and $w_{i\to 1}$ indicate the weight of node "i" and the weight of the link from node "i" to node "1", respectively. Index 1 stands for the most populated node of the network. These logarithms have a physical meaning, reflecting the configuration free energy $\Delta F_i \approx -k_BT\log(w_i)$ and the free-energy barrier between different configurations $\Delta F_{i\to j} \approx -k_BT\log(w_{i\to j})$. For M = 1 the relation between the two free energies is not linear. In other words, nodes with similar weights might be separated from node "1" by free-energy barriers of very different size (for instance the two nodes with $\Delta F_i \approx 1$ in Fig. 6). Choosing a higher M increases the correlation between node and link weights. For $M = 10^5$, $\Delta F_{i\to 1}$ grows linearly with $\Delta F_i$, indicating that link weights depend only on $w_i$. This behavior provides strong evidence for an uncorrelated sampling.
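Why a linear relation with unit slope signals uncorrelated sampling can be checked with a toy model. The sketch below assumes independent, identically distributed snapshots drawn from hypothetical Boltzmann weights (not the beta3s data); in that case $w_{i\to 1} \approx N p_i p_1$, so $\Delta F_{i\to 1}$ and $\Delta F_i$ differ only by a constant.

```python
import numpy as np

def uncorrelated_weights(probabilities, n_snapshots, seed=0):
    """Node weights w_i and link weights w_{i->top} for i.i.d. snapshots (uncorrelated regime)."""
    rng = np.random.default_rng(seed)
    states = rng.choice(len(probabilities), size=n_snapshots, p=probabilities)
    w = np.bincount(states, minlength=len(probabilities))
    top = int(np.argmax(w))                                   # most populated node, "1" in the text
    w_to_top = np.bincount(states[:-1][states[1:] == top], minlength=len(probabilities))
    return w, w_to_top, top

p = np.exp(-np.linspace(0.0, 4.0, 20))       # hypothetical Boltzmann weights exp(-Delta F_i / k_B T)
p /= p.sum()
w, w_to_top, top = uncorrelated_weights(p, 2_000_000)
dF_node = -np.log(w / w[top])                  # Delta F_i in k_B T units
dF_link = -np.log(w_to_top / w_to_top[top])    # Delta F_{i->1} relative to the loop 1 -> 1
slope = np.polyfit(dF_node, dF_link, 1)[0]     # close to 1 for uncorrelated sampling
```

For correlated dynamics (small M) the link weights also depend on the spatial proximity of configuration "i" to node "1", and the points scatter away from this line, as for M = 1 in Fig. 6.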

It is essential to stress that the uncorrelated regime is a frequent scenario in long MD sampling studies. These simulations explore transitions between several energy basins, for example when investigating the large configurational changes characterizing protein folding. In these cases, the configuration saving time is usually set to large values for computational reasons, resulting in an intra-basin uncorrelated regime. Finally, it is important to note that these results have been obtained for CSNs originating from a single-basin energy landscape. For networks describing fully sampled landscapes with a large number of basins, the network topology reflects the contributions from different basins.



VI. CONNECTION WITH FITNESS (HIDDEN VARIABLE) MODELS 

The scaling behavior observed in several networks has triggered a vast effort in modeling complex networks [22]. Of particular interest for CSNs is the model based on a fitness parameter on the nodes [27, 28, 29]. In the original fitness model [27], two nodes are connected with probability one if the sum of their fitnesses exceeds a given threshold. For CSNs reflecting a single enthalpic energy basin (as in this work), the fitness of a node is given by $-U(\mathbf{x})$. Eq. (4) with n = 1 shows that nodes are connected with probability one if the sum of their fitnesses is higher than the threshold $-\ln(W_0^2 a^{2D} N)$. In addition, there is a residual probability $N P(\mathbf{x}_1) P(\mathbf{x}_2)$ of connecting nodes of high energy. This formulation shows that CSNs fall into the large class of networks whose nodes are described by a fitness parameter, also referred to as a hidden variable [28]. Notably, the behavior of P(k), $K_{nn}(k)$ and C(k) in the uncorrelated case is in good agreement with that obtained in the previous works mentioned above.
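The correspondence can be made explicit by sampling one network realization directly from the hidden-variable rule. The sketch below is illustrative: `fitness` holds the values $-U(\mathbf{x}_i)$, `log_prefactor` stands for $\ln(W_0^2 a^{2D} N)$, and both the function name and the example values are hypothetical.

```python
import numpy as np

def sample_hidden_variable_network(fitness, log_prefactor, seed=0):
    """One realisation of the n = 1 CSN link rule viewed as a fitness (hidden-variable) model.

    Pairs with fitness_i + fitness_j >= -log_prefactor are linked with certainty;
    the others are linked with probability exp(log_prefactor + fitness_i + fitness_j).
    """
    f = np.asarray(fitness, dtype=float)
    log_p = log_prefactor + f[:, None] + f[None, :]
    prob = np.exp(np.minimum(0.0, log_p))               # min(1, N P(x_i) P(x_j)), overflow-safe
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random(prob.shape) < prob)      # one draw per unordered pair, loops on the diagonal
    adjacency = upper | upper.T
    return adjacency, prob

x = np.linspace(-3.0, 3.0, 61)
A, P = sample_hidden_variable_network(-0.5 * x**2, log_prefactor=2.0123)
degrees = A.sum(axis=1)          # fluctuates around the expected degrees P.sum(axis=1)
```

The threshold part of the rule reproduces the original fitness model, while the exponential remainder adds the sparse connections between high-energy nodes.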



VII. CONCLUSIONS 



The scaling behavior observed in the CSN topology has been investigated in the quadratic-well model, the native state of a triple-stranded β-sheet peptide and a lattice heteropolymer model.

Three main results have clearly emerged. First, in the limit of very large configuration saving times (uncorrelated regime), a first-order analytical approximation is provided for the degree distribution, the average neighbor connectivity and the clustering coefficient. Comparison between the analytical predictions and the results obtained from simulations of the dynamics in a quadratic well shows that, in the limit considered, the analytical solution correctly describes the CSN topology. These results allow for the interpretation of the topology observed in complex CSNs that cannot be tackled analytically, like those describing the native state of a β-sheet peptide or the low-energy configurations of a lattice heteropolymer. Second, varying the configuration saving time induces remarkable changes in the CSN topology. For small saving times the network exhibits assortative mixing, whereas a disassortative behavior is observed when the saving time is increased. Thus, for one and the same physical process (i.e. diffusion in a well), the associated CSN changes its topological properties. Third, the decaying tail of the clustering coefficient, which had been suggested to bear the signature of a hierarchical organization of the nodes in the native state of the β-sheet peptide, is in fact a consequence of uncorrelated sampling.
APPENDIX

For the case of the quadratic well in $D = 2$, the derivation of the degree distribution (Eq. (13)) is performed by first calculating the degree of a node at distance $r$. If $r < \sqrt{B}$, Eq. (5) reads:

$$k(r) = \frac{1}{a^2}\left[\int_{r_1<\sqrt{B-r^2}} d^2x_1 + \int_{r_1>\sqrt{B-r^2}} d^2x_1\, N a^4 W_0^2\, e^{-(r^2+r_1^2)/2}\right] = \frac{\pi}{a^2}\left(B - r^2 + 2\right) \tag{14}$$



If $r > \sqrt{B}$, Eq. (5) reads:

$$k(r) = \frac{2\pi}{a^2}\int_0^{\infty} r_1\,dr_1\; a^4 N W_0^2\, e^{-(r^2+r_1^2)/2} = 2\pi a^2 N W_0^2\, e^{-r^2/2} = \frac{2\pi}{a^2}\, e^{(B-r^2)/2} \tag{15}$$
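Eqs. (14) and (15) can be checked against a direct numerical evaluation of Eq. (5). The sketch below is a hypothetical check, not part of the original derivation: it uses $W_0 = 1/(2\pi)$, $n = 1$, the values $N = 3\cdot 10^6$ and $a = 0.2$ from Section IV, and a midpoint rule in polar coordinates (the link probability depends only on $r$ and $r_1$).

```python
import math

def degree_closed_form(r, N, a):
    """k(r) from Eqs. (14)-(15): U = r^2/2, D = 2, n = 1, W_0 = 1/(2*pi)."""
    B = 2.0 * math.log(N * a**4 / (4.0 * math.pi**2))
    if B > 0 and r < math.sqrt(B):
        return math.pi / a**2 * (B - r**2 + 2.0)               # saturated core, Eq. (14)
    return 2.0 * math.pi / a**2 * math.exp((B - r**2) / 2.0)   # sparse regime, Eq. (15)

def degree_numeric(r, N, a, r_max=12.0, steps=50_000):
    """Midpoint-rule evaluation of Eq. (5) in polar coordinates."""
    w_n = N * a**4 / (4.0 * math.pi**2)                        # N a^{2D} W_0^2 with D = 2
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r1 = (i + 0.5) * h
        link = min(1.0, w_n * math.exp(-(r**2 + r1**2) / 2.0))   # Eq. (4) with n = 1
        total += 2.0 * math.pi * r1 * link * h
    return total / a**2

for r in (0.0, 2.5, 4.0):
    print(r, degree_closed_form(r, 3e6, 0.2), degree_numeric(r, 3e6, 0.2))
```

Both branches give $k = 2\pi/a^2$ at $r = \sqrt{B}$, which is the crossover degree appearing in Eq. (13).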



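Eq. (8) is calculated using the properties of the $\delta(f(r))$ function. For a given function $f(r)$ with $n$ simple zeros $r_i^*$, i.e. $f(r_i^*) = 0$ and $f'(r_i^*) \neq 0$ for $i = 1, \ldots, n$, it is possible to write $\delta(f(r)) = \sum_{i=1}^{n} \delta(r - r_i^*)/|f'(r_i^*)|$. Here $r^*$ is given by inverting Eqs. (14) and (15), so that $P(k) \propto 2\pi r^*/|k'(r^*)|$. Hence

1. if $r < \sqrt{B} \Leftrightarrow k > 2\pi/a^2$: $r^{*2} = B + 2 - a^2 k/\pi$ and $|k'(r^*)| = 2\pi r^*/a^2$, so that $P(k) \propto a^2$, i.e. constant;

2. if $r > \sqrt{B} \Leftrightarrow k < 2\pi/a^2$: $r^{*2} = B - 2\ln\left(a^2 k/(2\pi)\right)$ and $|k'(r^*)| = r^* k$, so that $P(k) \propto 2\pi/k$.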
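The average neighbor connectivity is calculated using Eq. (9). With $\kappa = a^2 k/\pi$ it reads:

$$K_{nn}(k) = \begin{cases} \dfrac{\pi}{a^2}\left(B + e^{-B/2}\right) & \text{if } k < \dfrac{2\pi}{a^2} \\[2ex] \dfrac{2\pi}{a^2}\left[\dfrac{B}{2} - \dfrac{(\kappa-2)^2}{4\kappa} + \dfrac{1}{\kappa}\, e^{(\kappa-2-B)/2}\right] & \text{if } k > \dfrac{2\pi}{a^2} \end{cases} \tag{16}$$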
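Eq. (16) can likewise be compared with a direct numerical evaluation of Eq. (9). The sketch below makes the same assumptions as the degree check (hypothetical helper names, $W_0 = 1/(2\pi)$, $n = 1$, $N = 3\cdot 10^6$, $a = 0.2$) and evaluates the neighbor average at the node lying at distance $r$ from the minimum.

```python
import math

def _B(N, a):
    return 2.0 * math.log(N * a**4 / (4.0 * math.pi**2))

def degree(r, N, a):
    """k(r) from Eqs. (14)-(15)."""
    B = _B(N, a)
    if r * r < B:
        return math.pi / a**2 * (B - r**2 + 2.0)
    return 2.0 * math.pi / a**2 * math.exp((B - r**2) / 2.0)

def knn_closed_form(k, N, a):
    """Eq. (16)."""
    B = _B(N, a)
    if k <= 2.0 * math.pi / a**2:
        return math.pi / a**2 * (B + math.exp(-B / 2.0))      # flat region for small k
    kappa = a**2 * k / math.pi
    return 2.0 * math.pi / a**2 * (B / 2.0 - (kappa - 2.0)**2 / (4.0 * kappa)
                                   + math.exp((kappa - 2.0 - B) / 2.0) / kappa)

def knn_numeric(r, N, a, r_max=12.0, steps=50_000):
    """Midpoint-rule evaluation of Eq. (9) at the node x(k) lying at distance r."""
    w_n = N * a**4 / (4.0 * math.pi**2)
    h = r_max / steps
    acc = 0.0
    for i in range(steps):
        r1 = (i + 0.5) * h
        link = min(1.0, w_n * math.exp(-(r**2 + r1**2) / 2.0))   # Eq. (4) with n = 1
        acc += 2.0 * math.pi * r1 * link * degree(r1, N, a) * h
    return acc / (a**2 * degree(r, N, a))

r = 3.5
print(knn_numeric(r, 3e6, 0.2), knn_closed_form(degree(r, 3e6, 0.2), 3e6, 0.2))
```

The two branches of Eq. (16) coincide at $k = 2\pi/a^2$ ($\kappa = 2$), and the decrease of the second branch with $k$ is the disassortative tail seen for large M in Fig. 1B.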



The expression for the clustering coefficient is slightly more complex, since it requires distinguishing between several cases according to the possible values of $r$. Of particular interest is the case of large $k$ (i.e. small $r$), for which solving the integral of Eq. (10) gives:
FIG. 1: (Color online) Network topology for the quadratic well in D = 2 dimensions and different values of the parameter M. (A) Degree distribution. For clarity, binning has been applied for M > 10. Inset: degree distribution for uncorrelated sampling without binning. (B) Average neighbor degree. (C) Clustering coefficient. Black dots are obtained by a random sampling of the energy landscape (uncorrelated case, see text). Red dashed lines show the analytical estimation.




FIG. 2: (Color online) Assortativity mixing coefficient for different values of the parameter M for the three 
systems under study. 




FIG. 3: (Color online) Distribution of the relaxation times from three different initial configurations to the bottom configuration of the quadratic well in D = 2 dimensions (A) for M = 1 and (B) for M = 1000. The value r indicates the radial distance of each starting configuration from the bottom of the well.




FIG. 4: (Color online) Network topology for the beta3s peptide CSN at different values of the parameter 
M. (A) Degree distribution. To reduce noise a logarithmic binning has been applied. The native state of 
beta3s is shown in the inset. (B) Average neighbor connectivity. (C) Clustering coefficient. 




FIG. 5: (Color online) Network topology for the random lattice heteropolymer CSN at different values of 
the parameter M. (A) Degree distribution. The most visited configuration of the heteropolymer is shown 
in the inset. (B) Average neighbor connectivity. (C) Clustering coefficient. 




FIG. 6: (Color online) Relation between the free-energy barrier to the configuration at the bottom of the native-state basin ($\Delta F_{i\to 1}$) and the configuration free energy ($\Delta F_i$) for the most visited nodes of the beta3s network. Empty and full dots represent the M = 1 and M = $10^5$ cases, respectively.



